p2p: fix discovery shutdown (#8725) #8735

Merged
merged 1 commit into from
Nov 17, 2023

Conversation

battlmonstr
Contributor

Problem:
Some goroutines are blocked on shutdown:

  1. table close <-tab.closed // because table loop pending
  2. table loop <-refreshDone // because lookup shutdown blocks doRefresh
  3. lookup shutdown <-it.replyCh // because it.queryfunc (findnode - ensureBond) is blocked, and not returning errClosed (if it returns and pushes to it.replyCh, then shutdown() will unblock)
  4. findnode - ensureBond <-rm.errc // because the related replyMatcher was added after loop() exited, so there's nothing to push errClosed and unlock it

If the addReplyMatcher channel is buffered, UDPv4.pending() can add a new reply matcher after closeCtx.Done() has fired.
The errc result channel of such a reply matcher will never receive a value, because UDPv4.loop() has already exited at that point. Subsequent discovery operations then deadlock.

Solution:
Revert to an unbuffered channel.
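
The race described above can be reproduced in isolation. The following is a minimal sketch, not the Erigon code: `matcher`, `register` and `countStranded` are hypothetical stand-ins for replyMatcher, UDPv4.pending() and a test harness. It relies on the fact that Go's select picks pseudo-randomly among ready cases, so with a buffered channel the send can win even though the context is already cancelled.

```go
package main

import (
	"context"
	"errors"
)

// errClosed stands in for the discovery package's errClosed (assumed name).
var errClosed = errors.New("closed")

// matcher is a hypothetical, reduced stand-in for replyMatcher.
type matcher struct{ errc chan error }

// register mimics the shape of UDPv4.pending(): hand the matcher to the
// loop, or report errClosed if shutdown is observed first.
func register(ctx context.Context, add chan *matcher) *matcher {
	m := &matcher{errc: make(chan error, 1)}
	select {
	case add <- m:
		// A running loop would pick this up and eventually write to errc.
	case <-ctx.Done():
		m.errc <- errClosed
	}
	return m
}

// countStranded simulates a loop that has already exited (nobody reads
// from add) and counts matchers that were accepted anyway and therefore
// have no result waiting on errc.
func countStranded(bufSize, attempts int) int {
	stranded := 0
	for i := 0; i < attempts; i++ {
		ctx, cancel := context.WithCancel(context.Background())
		add := make(chan *matcher, bufSize)
		cancel() // shutdown happened before registration
		m := register(ctx, add)
		select {
		case <-m.errc:
			// errClosed was delivered; a waiter would unblock.
		default:
			stranded++ // accepted into the buffer; a waiter would hang
		}
	}
	return stranded
}
```

Under these assumptions, countStranded(0, n) is always zero, because a send on an unbuffered channel with no receiver is never ready and only the ctx.Done() case can fire. With countStranded(1, n) roughly half of the attempts end up stranded.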

@battlmonstr battlmonstr requested a review from mh0lt November 15, 2023 14:29
@yperbasis yperbasis linked an issue Nov 15, 2023 that may be closed by this pull request
Contributor

@mh0lt mh0lt left a comment

This channel needs a buffer, or at least it did, because some call sequences deadlock otherwise. Is it causing shutdown to hang?

I think we need to add an additional check rather than just unbuffering the channel. I'll take a look at the logging; it seems to me that we should add a status check.

Looking at the code, I think this fix is what is needed:

func (t *UDPv4) pending(id enode.ID, ip net.IP, port int, ptype byte, callback replyMatchFunc) *replyMatcher {
	ch := make(chan error, 1)
	p := &replyMatcher{from: id, ip: ip, port: port, ptype: ptype, callback: callback, errc: ch}

	// Check whether the context is already closed before starting to wait.
	if t.closeCtx.Err() != nil {
		ch <- errClosed
	} else {
		select {
		case t.addReplyMatcher <- p:
			// loop will handle it
		case <-t.closeCtx.Done():
			ch <- errClosed
		}
	}

	return p
}
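
The effect of the upfront check can be tried out in isolation. This is a rough sketch with hypothetical names (`matcher`, `registerChecked`, `resultAfterShutdown`), not the actual UDPv4 code. It only covers the case where the context is already cancelled when the function is entered; a cancellation that lands between the check and the select can still race with a buffered send.

```go
package main

import (
	"context"
	"errors"
)

// errClosed stands in for the discovery package's errClosed (assumed name).
var errClosed = errors.New("closed")

// matcher is a hypothetical, reduced stand-in for replyMatcher.
type matcher struct{ errc chan error }

// registerChecked follows the shape of the proposed pending(): it checks
// the context first and only then waits on the channel send.
func registerChecked(ctx context.Context, add chan *matcher) *matcher {
	m := &matcher{errc: make(chan error, 1)}
	if ctx.Err() != nil {
		// Already shut down: never touch the add channel.
		m.errc <- errClosed
		return m
	}
	select {
	case add <- m:
		// A running loop would handle it.
	case <-ctx.Done():
		m.errc <- errClosed
	}
	return m
}

// resultAfterShutdown registers a matcher after cancellation and returns
// whatever is waiting on errc, or nil if nothing was delivered.
func resultAfterShutdown(bufSize int) error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	m := registerChecked(ctx, make(chan *matcher, bufSize))
	select {
	case err := <-m.errc:
		return err
	default:
		return nil
	}
}
```

With the check in place, resultAfterShutdown returns errClosed for both a buffered and an unbuffered channel when shutdown has already happened.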

@AskAlexSharov AskAlexSharov merged commit 3ca7fdf into devel Nov 17, 2023
@AskAlexSharov AskAlexSharov deleted the pr/8725 branch November 17, 2023 02:13
battlmonstr added a commit that referenced this pull request Nov 17, 2023
Development

Successfully merging this pull request may close these issues.

Erigon doesn't shutdown gracefully
3 participants